Methods for Intrinsic Plagiarism Detection and Author Diarization

نویسندگان

  • Mikhail Kuznetsov
  • Anastasia Motrenko
  • Rita Kuznetsova
  • Vadim Strijov
چکیده

The paper investigates methods for intrinsic plagiarism detection and author diarization. We developed a plagiarism detection method based on constructing an author style function from features of text sentences and detecting outliers. We adapted the method for the diarization problem by segmenting author style statistics on text parts, which correspond to different authors. Both methods were tested on the PAN-2011 collection for the intrinsic plagiarism detection and implemented for the PAN-2016 competition on author diarization.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Intrinsic Detection of Plagiarism based on Writing Style Grouping

In this paper, we tackle the task of intrinsic plagiarism detection, also referred to as author diarization. This task deals with identifying segments within a document written by multiple authors [2]. The main goal is to discover deviations in the writing style, looking for parts of the document that could potentially be written by another person [4]. In this paper, we present our hybrid appro...

متن کامل

Overview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a crossgenre persp...

متن کامل

Clustering by Authorship Within and Across Documents

The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised attribution models that are able to estim...

متن کامل

Intrinsic Plagiarism Detection Using Character n-gram Profiles

The task of intrinsic plagiarism detection deals with cases where no reference corpus is available and it is exclusively based on stylistic changes or inconsistencies within a given document. In this paper a new method is presented that attempts to quantify the style variation within a document using character n-gram profiles and a style change function based on an appropriate dissimilarity mea...

متن کامل

Approaches for Intrinsic and External Plagiarism Detection - Notebook for PAN at CLEF 2011

Plagiarism detection has been considered as a classification problem which can be approximated with intrinsic strategies, considering self-based information from a given document, and external strategies, considering comparison techniques between a suspicious document and different sources. In this work, both intrinsic and external approaches for plagiarism detection are presented. First, the m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016